- Drivers?
- What is under Managerial/corporate control?
- Simple vs. complicated?
November 18, 2019
Enterprise Industries, owner of Fresh detergent, wants to predict demand for its product, an extra-large bottle of Fresh liquid detergent. Given a model for demand, Enterprise can:
Four indicators for 30 sales periods (each period is 4 weeks):
| | Fresh.Demand | Fresh.Price | Industry.Price | Advertising.Spending |
|---|---|---|---|---|
| 1 | 7.38 | 3.85 | 3.80 | 5.50 |
| 2 | 8.51 | 3.75 | 4.00 | 6.75 |
| 3 | 9.52 | 3.70 | 4.30 | 7.25 |
| 4 | 7.50 | 3.70 | 3.70 | 5.50 |
| 5 | 9.33 | 3.60 | 3.85 | 7.00 |
| 6 | 8.28 | 3.60 | 3.80 | 6.50 |
| 7 | 8.75 | 3.60 | 3.75 | 6.75 |
| 8 | 7.87 | 3.80 | 3.85 | 5.25 |
| 9 | 7.10 | 3.80 | 3.65 | 5.25 |
| 10 | 8.00 | 3.85 | 4.00 | 6.00 |
| 11 | 7.89 | 3.90 | 4.10 | 6.50 |
| 12 | 8.15 | 3.90 | 4.00 | 6.25 |
| 13 | 9.10 | 3.70 | 4.10 | 7.00 |
| 14 | 8.86 | 3.75 | 4.20 | 6.90 |
| 15 | 8.90 | 3.75 | 4.10 | 6.80 |
| 16 | 8.87 | 3.80 | 4.10 | 6.80 |
| 17 | 9.26 | 3.70 | 4.20 | 7.10 |
| 18 | 9.00 | 3.80 | 4.30 | 7.00 |
| 19 | 8.75 | 3.70 | 4.10 | 6.80 |
| 20 | 7.95 | 3.80 | 3.75 | 6.50 |
| | Mean | Std. Dev. | Minimum | Maximum | Atoms |
|---|---|---|---|---|---|
| Fresh.Demand | 8.38 | 0.68 | 7.10 | 9.52 | 26.00 |
| Fresh.Price | 3.73 | 0.09 | 3.55 | 3.90 | 8.00 |
| Industry.Price | 3.95 | 0.22 | 3.65 | 4.30 | 11.00 |
| Advertising.Spending | 6.45 | 0.57 | 5.25 | 7.25 | 13.00 |
| | Fresh.Demand | Fresh.Price | Industry.Price | Advertising.Spending |
|---|---|---|---|---|
| Fresh.Demand | 1.00 | -0.47 | 0.74 | 0.88 |
| Fresh.Price | -0.47 | 1.00 | 0.08 | -0.47 |
| Industry.Price | 0.74 | 0.08 | 1.00 | 0.60 |
| Advertising.Spending | 0.88 | -0.47 | 0.60 | 1.00 |
Is the Industry More or Less Expensive?
Price.Difference is the industry price minus the Fresh price.

    # Is the industry more or less expensive?
    # Price.Difference is industry minus fresh.
    t.test(fresh.data$Price.Difference)


    ##
    ##  One Sample t-test
    ##
    ## data:  fresh.data$Price.Difference
    ## t = 5.1383, df = 29, p-value = 1.727e-05
    ## alternative hypothesis: true mean is not equal to 0
    ## 95 percent confidence interval:
    ##  0.1284192 0.2982474
    ## sample estimates:
    ## mean of x
    ## 0.2133333

Yes. With 95% confidence, Fresh is on average $0.13 to $0.30 cheaper than the industry price.
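As a quick check on where that interval comes from, the sketch below rebuilds it from the printed summary numbers alone (mean, t statistic, degrees of freedom). The variable names are illustrative and not part of the original analysis; the only inputs are the values reported in the t-test output.

```r
# Values copied from the one-sample t-test output
mean_diff <- 0.2133333
t_stat <- 5.1383
df <- 29

# For H0: mean = 0, t = mean / SE, so the standard error can be backed out
se <- mean_diff / t_stat

# 95% interval: mean plus or minus the t critical value times the SE
margin <- qt(0.975, df) * se
ci <- c(lower = mean_diff - margin, upper = mean_diff + margin)
print(round(ci, 4))
```

Because the t statistic is printed to only four decimals, the rebuilt bounds agree with the reported interval only up to rounding.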
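
    library(ggplot2)
    library(patchwork)  # lets `+` place two ggplots side by side

    fresh.data$obs <- seq_len(30)  # sales period index, 1 to 30

    # Left panel: Fresh (blue) and industry (red) prices by period
    p.prices <- ggplot(fresh.data) +
      aes(x = obs) +
      geom_line(aes(y = Fresh.Price), color = "blue", alpha = 0.2) +
      geom_line(aes(y = Industry.Price), color = "red", alpha = 0.2) +
      labs(title = "Prices", x = "Period", y = "Fresh/Blue Industry/Red")

    # Right panel: industry price minus Fresh price
    p.diff <- ggplot(fresh.data, aes(x = obs, y = Price.Difference)) +
      geom_line() +
      labs(title = "Price Differences", x = "Period")

    p.prices + p.diff
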
Let’s have a look in 3-D.
Requires the rgl package. Warnings apply.

    library(rgl)

    plot3d(
      x = fresh.data$Price.Difference,
      y = fresh.data$Advertising.Spending,
      z = fresh.data$Fresh.Demand,
      surface = FALSE, residuals = TRUE, bg = "white",
      axis.scales = TRUE, grid = TRUE, ellipsoid = FALSE
    )

| | Dependent variable: Fresh.Demand |
|---|---|
| Fresh.Price | -2.358*** |
| | (0.638) |
| Industry.Price | 1.612*** |
| | (0.295) |
| Advertising.Spending | 0.501*** |
| | (0.126) |
| Constant | 7.589*** |
| | (2.445) |
| Observations | 30 |
| R2 | 0.894 |
| Adjusted R2 | 0.881 |
| Residual Std. Error | 0.235 (df = 26) |
| F Statistic | 72.797*** (df = 3; 26) |
| Note: | * p<0.1; ** p<0.05; *** p<0.01 |
Conforms to intuition:
Constructing naive confidence intervals:
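A minimal sketch of what such intervals might look like, assuming "naive" means the estimate plus or minus two standard errors (that reading is an assumption, as are the variable names). The estimates and standard errors are copied from the regression table.

```r
# Coefficients and standard errors from the regression table
est <- c(Fresh.Price = -2.358, Industry.Price = 1.612, Advertising.Spending = 0.501)
se <- c(0.638, 0.295, 0.126)

# Naive interval: estimate +/- 2 standard errors (assumed rule of thumb)
naive_ci <- cbind(lower = est - 2 * se, upper = est + 2 * se)
print(naive_ci)
```

Under this rule of thumb the Fresh.Price interval, with its sign flipped, overlaps the Industry.Price interval, so the two price effects may be equal in size and opposite in sign.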
How could we test this?
| Dependent variable: Fresh.Demand | (1) | (2) |
|---|---|---|
| Fresh.Price | -2.358*** | |
| | (0.638) | |
| Industry.Price | 1.612*** | |
| | (0.295) | |
| Price.Difference | | 1.588*** |
| | | (0.299) |
| Advertising.Spending | 0.501*** | 0.563*** |
| | (0.126) | (0.119) |
| Constant | 7.589*** | 4.407*** |
| | (2.445) | (0.722) |
| Observations | 30 | 30 |
| R2 | 0.894 | 0.886 |
| Adjusted R2 | 0.881 | 0.878 |
| Residual Std. Error | 0.235 (df = 26) | 0.238 (df = 27) |
| F Statistic | 72.797*** (df = 3; 26) | 104.967*** (df = 2; 27) |
| Note: | * p<0.1; ** p<0.05; *** p<0.01 | |
| | Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
|---|---|---|---|---|---|---|
| 1 | 27 | 1.53 | | | | |
| 2 | 26 | 1.43 | 1 | 0.10 | 1.85 | 0.1855 |
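A nested-model comparison of this kind can be sketched with `anova()` on two `lm` fits. Since the full `fresh.data` is not reproduced here, the sketch uses simulated data; the data frame `sim`, the seed, and the coefficients used to generate it are all illustrative assumptions, so the numbers will not match the table.

```r
set.seed(1)
n <- 30

# Simulated stand-in for fresh.data (ranges loosely follow the summary table)
sim <- data.frame(
  Fresh.Price = runif(n, 3.55, 3.90),
  Industry.Price = runif(n, 3.65, 4.30),
  Advertising.Spending = runif(n, 5.25, 7.25)
)
sim$Price.Difference <- sim$Industry.Price - sim$Fresh.Price

# Arbitrary data-generating process in which only the price difference matters
sim$Fresh.Demand <- 4.4 + 1.6 * sim$Price.Difference +
  0.56 * sim$Advertising.Spending + rnorm(n, sd = 0.24)

# Restricted model: the two prices enter only through their difference
restricted <- lm(Fresh.Demand ~ Price.Difference + Advertising.Spending, data = sim)

# Unrestricted model: each price gets its own coefficient
unrestricted <- lm(Fresh.Demand ~ Fresh.Price + Industry.Price + Advertising.Spending,
                   data = sim)

# F-test of the single restriction (equal and opposite price coefficients)
comparison <- anova(restricted, unrestricted)
print(comparison)
```

A large p-value in the second row would mean the data do not reject the simpler price-difference model.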
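Let’s solve for F in terms of \(R^2\). With \(q\) restrictions and \(n - k - 1\) residual degrees of freedom in the larger model, \(F = \frac{(R^2_{\text{larger}} - R^2_{\text{smaller}})/q}{(1 - R^2_{\text{larger}})/(n - k - 1)}\).
What is the difference in \(R^2\) across the two models?
0.007569.
What is the average unexplained variance for the biggest model, \((1 - R^2_{\text{larger}})/26\)? About 0.00409.
Which yields the following F: 1.8497987.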
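The same arithmetic can be sketched in a few lines of R. The inputs are the reported difference in R-squared and the larger model's R-squared as printed in the regression table (0.894, rounded to three decimals), so the result is close to the reported F but not identical. Variable names are illustrative.

```r
r2_diff <- 0.007569   # reported difference in R-squared between the two models
r2_full <- 0.894      # larger model's R-squared, rounded as in the table
n <- 30               # observations
k_full <- 3           # predictors in the larger model
q <- 1                # number of restrictions being tested

df_resid <- n - k_full - 1   # 26 residual degrees of freedom

# F = (gain in R-squared per restriction) / (unexplained share per residual df)
f_stat <- (r2_diff / q) / ((1 - r2_full) / df_resid)

# Upper-tail probability under the F(q, df_resid) distribution
p_value <- pf(f_stat, df1 = q, df2 = df_resid, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```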

    # Point size encodes demand; axes are the two explanatory variables
    ggplot(fresh.data, aes(y = Advertising.Spending, x = Price.Difference, size = Fresh.Demand)) +
      geom_point() +
      labs(title = "The Explainers")

| Dependent variable: Fresh.Demand | (1) | (2) | (3) |
|---|---|---|---|
| Fresh.Price | -2.358*** | | |
| | (0.638) | | |
| Industry.Price | 1.612*** | | |
| | (0.295) | | |
| Price.Difference | | 1.588*** | 1.307*** |
| | | (0.299) | (0.304) |
| Advertising.Spending | 0.501*** | 0.563*** | -3.696* |
| | (0.126) | (0.119) | (1.850) |
| I(Advertising.Spending^2) | | | 0.349** |
| | | | (0.151) |
| Constant | 7.589*** | 4.407*** | 17.324*** |
| | (2.445) | (0.722) | (5.641) |
| Observations | 30 | 30 | 30 |
| R2 | 0.894 | 0.886 | 0.905 |
| Adjusted R2 | 0.881 | 0.878 | 0.894 |
| Residual Std. Error | 0.235 (df = 26) | 0.238 (df = 27) | 0.221 (df = 26) |
| F Statistic | 72.797*** (df = 3; 26) | 104.967*** (df = 2; 27) | 82.941*** (df = 3; 26) |
| Note: | * p<0.1; ** p<0.05; *** p<0.01 | | |
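The negative linear and positive squared advertising coefficients in model (3) are easier to read as a marginal effect. The sketch below, using only the two reported coefficients, computes the advertising level at which the fitted curve turns upward and the slope at a few advertising levels taken from the summary table (minimum, mean, maximum). Names are illustrative.

```r
b_ad <- -3.696     # coefficient on Advertising.Spending, model (3)
b_ad_sq <- 0.349   # coefficient on Advertising.Spending^2, model (3)

# The fitted quadratic b_ad * A + b_ad_sq * A^2 bottoms out where its slope is zero
turning_point <- -b_ad / (2 * b_ad_sq)

# Marginal effect of advertising on demand at a given advertising level
marginal_effect <- function(ad) b_ad + 2 * b_ad_sq * ad

ad_levels <- c(min = 5.25, mean = 6.45, max = 7.25)
effects <- marginal_effect(ad_levels)

print(round(turning_point, 3))
print(round(effects, 3))
```

The turning point sits just above the smallest observed advertising level, so over nearly all of the observed range the fitted effect of advertising is positive and increasing.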

    # 2 x 2 grid of residual diagnostics for the linear and quadratic models
    par(mfrow = c(2, 2))
    qqnorm(fresh.model.diff$residuals, main = "QQ-Normal: Linear Ad.Spending", datax = TRUE)
    qqnorm(fresh.model.sq$residuals, main = "QQ-Normal: Quadratic Ad.Spending", datax = TRUE)
    plot(fresh.data$Price.Difference, fresh.model.sq$residuals, xlab = "Price.Difference")
    plot(fresh.data$Price.Difference, fresh.model.diff$residuals, xlab = "Price.Difference")

Once we decide on a model, we can come up with at least two very valuable quantities.
Let’s characterize the trade-offs in choosing among these models.
We will take regression models in three directions.
- Enriching the set of \(X\) with more variation in the types of predictors. This will require paying attention to the interrelationships among the elements of \(X\).
- Automating feature selection using defined criteria.
- Expanding the data types we model as outcomes to include binary choices.